[0.9.1][Feature] MoE alltoallv communication optimization for unquantized RL training scenarios & alltoallv support for DBO #1547

Merged
merged 68 commits into vllm-project:v0.9.1-dev on Jul 14, 2025

weijinqian0
Contributor

@weijinqian0 weijinqian0 commented Jul 1, 2025

[Feature] MoE alltoallv communication optimization for unquantized RL training scenarios & alltoallv support for DBO

Introduction

This PR introduces two key optimizations for MoE model performance:

  1. Efficient Token Dispatcher:

    • Implements an optimized alltoallv_seq token dispatcher (adapted from NVIDIA Megatron and Ascend MindSpeed)
    • Significantly more efficient than the current alltoall implementation when token_permute/unpermute fusion is used
    • Enable with: VLLM_ASCEND_ENABLE_MOE_ALL2ALL_SEQ=1
  2. DBO Support for alltoallv_seq:

    • Builds upon the alltoallv_seq dispatcher to support DBO (Dual Batch Overlap)
    • Enables overlapping alltoallv communication with computation during the prefill stage
    • Enable with both:
      • VLLM_ASCEND_ENABLE_MOE_ALL2ALL_SEQ=1
      • VLLM_ASCEND_ENABLE_DBO=1
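
As a rough illustration of how the two switches combine, the sketch below sets both environment variables from Python and summarizes what they request. The variable names come from the description above; the helper `describe_moe_mode` and the rule that only the string "1" counts as enabled are assumptions of this sketch, not vllm-ascend's actual parsing logic.

```python
import os

# Flag names as given in the PR description.
SEQ_FLAG = "VLLM_ASCEND_ENABLE_MOE_ALL2ALL_SEQ"
DBO_FLAG = "VLLM_ASCEND_ENABLE_DBO"


def describe_moe_mode(env):
    """Hypothetical helper: report which optimizations the flags ask for.

    Assumption: a flag is "on" only when its value is exactly "1".
    """
    seq = env.get(SEQ_FLAG, "0") == "1"
    dbo = env.get(DBO_FLAG, "0") == "1"
    return {
        "alltoallv_seq": seq,
        # Per the description, DBO over alltoallv_seq needs both flags.
        "dbo_on_alltoallv_seq": seq and dbo,
    }


# Set the flags before the engine process reads its environment.
os.environ[SEQ_FLAG] = "1"
os.environ[DBO_FLAG] = "1"
print(describe_moe_mode(os.environ))
```

Setting only `VLLM_ASCEND_ENABLE_MOE_ALL2ALL_SEQ=1` would select the dispatcher without the overlap in this sketch.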

Performance Improvements

Testing on Qwen3-30B-A3B shows nearly 2x throughput improvement compared to the original alltoall implementation.
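
For readers unfamiliar with the token permute/unpermute step that the dispatcher description mentions, here is a minimal single-process sketch. It assumes top-1 routing and plain Python lists, performs no real communication, and is not the PR's implementation; the function names and the `experts_per_rank` parameter are hypothetical. It only shows why the exchange is an alltoallv: the number of tokens sent to each rank varies with routing.

```python
def permute_by_expert(tokens, expert_ids, num_experts, experts_per_rank):
    """Group tokens by destination expert and count tokens per destination rank."""
    # sorted() is stable, so tokens routed to the same expert keep their order.
    order = sorted(range(len(tokens)), key=lambda i: expert_ids[i])
    permuted = [tokens[i] for i in order]
    num_ranks = num_experts // experts_per_rank
    send_counts = [0] * num_ranks
    for expert in expert_ids:
        send_counts[expert // experts_per_rank] += 1
    return permuted, order, send_counts


def unpermute(permuted, order):
    """Scatter (expert-processed) tokens back to their original positions."""
    restored = [None] * len(permuted)
    for pos, src in enumerate(order):
        restored[src] = permuted[pos]
    return restored


tokens = ["t0", "t1", "t2", "t3", "t4"]
expert_ids = [3, 0, 2, 0, 1]  # one expert per token (top-1 routing assumed)
permuted, order, send_counts = permute_by_expert(
    tokens, expert_ids, num_experts=4, experts_per_rank=2
)
print(permuted)     # tokens grouped by expert
print(send_counts)  # uneven per-rank sizes an alltoallv call would consume
print(unpermute(permuted, order))
```

In a real dispatcher the per-rank counts would be passed as split sizes to the collective, and fusing the permute and unpermute steps avoids extra passes over the token buffer.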

weijinqian_v1 added 12 commits July 1, 2025 09:51
…training sence & alltoallv support dpo

Signed-off-by: weijinqian_v1 <weijinqian@huawei.com>
…training sence & alltoallv support dpo

Signed-off-by: weijinqian_v1 <weijinqian@huawei.com>
…training sence & alltoallv support dpo

Signed-off-by: weijinqian_v1 <weijinqian@huawei.com>
…training sence & alltoallv support dpo

Signed-off-by: weijinqian_v1 <weijinqian@huawei.com>
…training sence & alltoallv support dpo

Signed-off-by: weijinqian_v1 <weijinqian@huawei.com>
…training sence & alltoallv support dpo

Signed-off-by: weijinqian_v1 <weijinqian@huawei.com>
…training sence & alltoallv support dpo

Signed-off-by: weijinqian_v1 <weijinqian@huawei.com>
…training sence & alltoallv support dpo

Signed-off-by: weijinqian_v1 <weijinqian@huawei.com>
…training sence & alltoallv support dpo

Signed-off-by: weijinqian_v1 <weijinqian@huawei.com>
…training sence & alltoallv support dpo

Signed-off-by: weijinqian_v1 <weijinqian@huawei.com>
…training sence & alltoallv support dpo

Signed-off-by: weijinqian_v1 <weijinqian@huawei.com>
…training sence & alltoallv support dpo

Signed-off-by: weijinqian_v1 <weijinqian@huawei.com>
weijinqian_v1 added 3 commits July 1, 2025 14:03
…training sence & alltoallv support dpo

Signed-off-by: weijinqian_v1 <weijinqian@huawei.com>
…training sence & alltoallv support dpo

Signed-off-by: weijinqian_v1 <weijinqian@huawei.com>
…training sence & alltoallv support dpo

Signed-off-by: weijinqian_v1 <weijinqian@huawei.com>

github-actions bot commented Jul 3, 2025

This pull request has conflicts, please resolve those before we can evaluate the pull request.

Signed-off-by: weijinqian_v1 <weijinqian@huawei.com>

github-actions bot commented Jul 8, 2025

This pull request has conflicts, please resolve those before we can evaluate the pull request.

weijinqian_v1 and others added 5 commits July 9, 2025 16:25
Signed-off-by: weijinqian_v1 <weijinqian@huawei.com>
Signed-off-by: weijinqian_v1 <weijinqian@huawei.com>
Signed-off-by: weijinqian_v1 <weijinqian@huawei.com>
Signed-off-by: weijinqian_v1 <weijinqian@huawei.com>
Signed-off-by: weijinqian_v1 <weijinqian@huawei.com>
harygo22 added 2 commits July 11, 2025 14:34
Signed-off-by: duyangkai <duyangkai@huawei.com>
Signed-off-by: duyangkai <duyangkai@huawei.com>
# at different points based on MoE settings as late as possible.
# Valid sync points are "before_permutation_1", "before_ep_alltoall",
# "before_finish", and "no_sync".
self.cuda_sync_point = "no_sync"
Collaborator

Why use this naming? It seems a little unsuitable for vllm-ascend.

cuda_sync_point has already been renamed to device_sync_point.
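
To make the sync-point setting discussed in this thread concrete, here is a small sketch that validates the value against the four names listed in the quoted comment. The class `DispatcherSyncConfig` and its `should_sync` method are hypothetical and do not reflect the merged code; only the attribute name `device_sync_point` and the valid values come from the thread.

```python
# Valid names copied from the comment quoted in the review thread.
VALID_SYNC_POINTS = (
    "before_permutation_1",
    "before_ep_alltoall",
    "before_finish",
    "no_sync",
)


class DispatcherSyncConfig:
    """Hypothetical holder for the sync-point setting (illustration only)."""

    def __init__(self, device_sync_point="no_sync"):
        self.device_sync_point = device_sync_point  # goes through the setter

    @property
    def device_sync_point(self):
        return self._device_sync_point

    @device_sync_point.setter
    def device_sync_point(self, value):
        # Reject typos early instead of silently never synchronizing.
        if value not in VALID_SYNC_POINTS:
            raise ValueError(f"unknown sync point: {value!r}")
        self._device_sync_point = value

    def should_sync(self, point):
        """Return True when the dispatcher reaches the configured point."""
        return self._device_sync_point == point


cfg = DispatcherSyncConfig()
cfg.device_sync_point = "before_ep_alltoall"
print(cfg.should_sync("before_ep_alltoall"), cfg.should_sync("before_finish"))
```

A device-neutral attribute name keeps the same check usable on NPU backends, which is the concern raised in the review comment.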

harygo22 and others added 5 commits July 11, 2025 17:32
Signed-off-by: duyangkai <duyangkai@huawei.com>
… into v0.9.1-dev

# Conflicts:
#	tests/ut/test_distributed_tensor_parallel.py
#	tests/ut/test_moe_util.py
#	tests/ut/test_token_dispatcher.py
#	vllm_ascend/ascend_forward_context.py
#	vllm_ascend/envs.py
#	vllm_ascend/models/moe_block.py
#	vllm_ascend/models/qwen3_dbo.py
#	vllm_ascend/ops/fused_moe.py
#	vllm_ascend/ops/moe_dispatcher/token_dispatcher.py
Signed-off-by: weijinqian_v1 <weijinqian@huawei.com>
Signed-off-by: weijinqian_v1 <weijinqian@huawei.com>
Signed-off-by: weijinqian_v1 <weijinqian@huawei.com>

This pull request has conflicts, please resolve those before we can evaluate the pull request.

weijinqian added 11 commits July 12, 2025 23:43
Signed-off-by: weijinqian_v1 <weijinqian@huawei.com>
Signed-off-by: weijinqian_v1 <weijinqian@huawei.com>
Signed-off-by: weijinqian_v1 <weijinqian@huawei.com>
Signed-off-by: weijinqian_v1 <weijinqian@huawei.com>
Signed-off-by: weijinqian_v1 <weijinqian@huawei.com>
Signed-off-by: weijinqian_v1 <weijinqian@huawei.com>
Signed-off-by: weijinqian_v1 <weijinqian@huawei.com>
Signed-off-by: weijinqian_v1 <weijinqian@huawei.com>
Signed-off-by: weijinqian_v1 <weijinqian@huawei.com>
Signed-off-by: weijinqian_v1 <weijinqian@huawei.com>
Signed-off-by: weijinqian_v1 <weijinqian@huawei.com>
weijinqian_v1 added 2 commits July 12, 2025 23:44
Signed-off-by: weijinqian_v1 <weijinqian@huawei.com>
Signed-off-by: weijinqian_v1 <weijinqian@huawei.com>
weijinqian_v1 and others added 2 commits July 12, 2025 23:52
Signed-off-by: weijinqian_v1 <weijinqian@huawei.com>
Signed-off-by: duyangkai <duyangkai@huawei.com>
@ganyi1996ppo ganyi1996ppo merged commit 63944db into vllm-project:v0.9.1-dev Jul 14, 2025
16 checks passed
9 participants